Reference String Extraction Using Line-Based Conditional Random Fields

Author

  • Martin Körner
Abstract

The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publication as a potential part of a reference string. By applying line-based conditional random fields rather than constructing the graphical model based on the individual words, dependencies and patterns that are typical in reference sections provide strong features while the overall complexity of the model is reduced.
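The line-based idea can be illustrated with a minimal sketch. It assumes a tagger (such as a linear-chain CRF over lines) has already assigned one label per line. The label names `B-REF`, `I-REF`, and `O`, the feature names, and both helper functions are hypothetical choices made for illustration; they are not taken from the paper.

```python
import re


def line_features(line):
    """Hypothetical per-line features; not the paper's actual feature set."""
    stripped = line.strip()
    return {
        # Many reference lists start entries with a marker such as "[12]".
        "starts_with_bracket_number": bool(re.match(r"^\[\d+\]", stripped)),
        # A four-digit year is a common cue inside a reference string.
        "contains_year": bool(re.search(r"\b(19|20)\d{2}\b", stripped)),
        "ends_with_period": stripped.endswith("."),
        "length": len(stripped),
    }


def group_reference_strings(lines, labels):
    """Join lines tagged B-REF/I-REF into reference strings; skip O lines.

    The labels are assumed to come from a line-level sequence tagger.
    """
    references = []
    current = []
    for line, label in zip(lines, labels):
        if label == "B-REF":
            # A new reference begins; flush the previous one, if any.
            if current:
                references.append(" ".join(current))
            current = [line.strip()]
        elif label == "I-REF" and current:
            # Continuation of the reference that is currently open.
            current.append(line.strip())
        else:
            # Non-reference line (or a stray continuation): close any open reference.
            if current:
                references.append(" ".join(current))
            current = []
    if current:
        references.append(" ".join(current))
    return references
```

Because every line of the publication receives a label, this kind of grouping needs no separate reference-section detection step: lines outside the reference section would simply be tagged `O`.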

Similar resources

Automatic Keyword Extraction from Documents Using Conditional Random Fields

Keywords are a subset of the words or phrases in a document that describe its meaning. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still do not have keywords assigned. On the other hand, manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. Therefore, most algorithms and systems aimed t...

Monotone string-to-string translation for NLU and ASR tasks

Monotone string-to-string translation problems have to be tackled as part of almost all state-of-the-art natural language understanding and large vocabulary continuous speech recognition systems. In this work, two such tasks will be investigated in detail and improved using conditional random fields, namely concept tagging and grapheme-to-phoneme conversion. Concept tagging is usually one of the...

Adaptive String Similarity Metrics For Biomedical Reference Resolution

In this paper we present the evaluation of a set of string similarity metrics used to resolve the mapping from strings to concepts in the UMLS MetaThesaurus. String similarity is conceived as a single component in a full Reference Resolution System that would resolve such a mapping. Given this qualification, we obtain positive results achieving 73.6 F-measure (76.1 precision and 71.4 recall) fo...

Heterogeneous Web Data Extraction Algorithm Based On Modified Hidden Conditional Random Fields

Because extracting useful information from heterogeneous Web data is of great importance, in this paper we propose a novel heterogeneous Web data extraction algorithm using a modified hidden conditional random fields model. Considering that traditional linear-chain conditional random fields cannot effectively solve the problem of complex and heterogeneous Web data extraction, we modify the...

ICSI-CRF: The Generation of References to the Main Subject and Named Entities Using Conditional Random Fields

In this paper, we describe our contribution to the Generation Challenge 2009 for the tasks of generating Referring Expressions to the Main Subject References (MSR) and Named Entities Generation (NEG). To generate the referring expressions, we employ the Conditional Random Fields (CRF) learning technique due to the fact that the selection of an expression depends on the selection of the previous...

Journal:
  • CoRR

Volume abs/1705.08154

Publication date 2017